Smart Paper Notes

Curriculum learning for improved femur fracture classification: scheduling data with prior knowledge and uncertainty

Amelia Jiménez-Sánchez , Diana Mateus , Sonja Kirchhoff , Chlodwig Kirchhoff , Peter Biberthaler , Nassir Navab , Miguel A. González Ballester , Gemma Piella

Category: Computer Vision

2020-07-31

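Adequate classification of proximal femur fractures from X-ray images is crucial for treatment choice and for the patient's clinical outcome. We rely on the commonly used AO system, which describes a hierarchical knowledge tree that classifies images into types and subtypes according to the location and complexity of the fracture. In this paper, we propose a method based on a convolutional neural network (CNN) for the automatic classification of proximal femur fractures into 3 and 7 AO classes. CNNs are known to need large and representative datasets with reliable labels, which are hard to collect for the application at hand. We therefore design a curriculum learning (CL) approach that improves on basic CNN performance under these conditions. Our novel formulation unites three curriculum strategies: individually weighting training samples, reordering the training set, and sampling subsets of the data. At the core of these strategies is a scoring function that ranks the training samples. We define two novel scoring functions: one derived from domain-specific prior knowledge, and an original self-paced uncertainty score. We perform experiments on a clinical dataset of proximal femur radiographs. The curriculum improves proximal femur fracture classification up to the performance of experienced trauma surgeons. The best curriculum method reorders the training set based on prior knowledge, resulting in a classification improvement of 15%. Using the publicly available MNIST dataset, we further discuss and demonstrate the benefits of our unified CL formulation in three controlled and challenging digit recognition scenarios: with limited amounts of data, under class imbalance, and in the presence of label noise. The code for our work is available at: https://github.com/ameliajimenez/curriculum-learning-prior-uncertainty.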
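
The three curriculum strategies named in this abstract (weighting, reordering, subset sampling) all hinge on a scoring function that ranks samples. The following is a minimal sketch of that idea only, not the authors' implementation: the function names, the toy data, and the `difficulty` score are hypothetical stand-ins for the prior-knowledge and uncertainty scores the paper defines.

```python
def curriculum_order(samples, score):
    """Reorder the training set from easiest (lowest score) to hardest."""
    return sorted(samples, key=score)


def curriculum_weights(samples, score):
    """Weight each sample individually: easier samples get larger weights."""
    scores = [score(s) for s in samples]
    top = max(scores)
    raw = [top - sc + 1.0 for sc in scores]  # +1 keeps every weight positive
    total = sum(raw)
    return [w / total for w in raw]  # normalized so the weights sum to 1


def curriculum_subset(samples, score, fraction):
    """Sample a subset: keep only the easiest `fraction` of the data."""
    keep = max(1, int(len(samples) * fraction))
    return curriculum_order(samples, score)[:keep]


# Toy data: (sample_id, difficulty) pairs. The difficulty values are made up
# and stand in for whatever scoring function ranks the real samples.
data = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.3)]


def difficulty(sample):
    return sample[1]


ordered = curriculum_order(data, difficulty)
weights = curriculum_weights(data, difficulty)
subset = curriculum_subset(data, difficulty, 0.5)
```

In a real training loop, `ordered` would drive the order of mini-batches, `weights` would scale per-sample losses, and `subset` would grow as training progresses.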

translated by Google Translate

Detection, Explanation and Filtering of Cyber Attacks Combining Symbolic and Sub-Symbolic Methods

Anna Himmelhuber , Dominik Dold , Stephan Grimm , Sonja Zillner , Thomas Runkler

Category: Machine Learning

2022-12-23

Machine learning (ML) on graph-structured data has recently attracted growing interest in the context of intrusion detection in the cybersecurity domain. These ML methods are gaining traction because monitoring tools generate ever more data and attacks are becoming more sophisticated. Knowledge graphs and their corresponding learning techniques, such as Graph Neural Networks (GNNs), can seamlessly integrate data from multiple domains using human-understandable vocabularies, and are finding application in the cybersecurity domain. However, like other connectionist models, GNNs lack transparency in their decision making. This matters especially because the cybersecurity domain tends to produce a high number of false positive alerts, so triage must be done by domain experts, which requires a lot of manpower. We therefore address Explainable AI (XAI) for GNNs to enhance trust management, exploring the combination of symbolic and sub-symbolic methods that incorporate domain knowledge in the area of cybersecurity. We tested this approach by generating explanations in an industrial demonstrator system. The proposed method is shown to produce intuitive explanations for alerts across a diverse range of scenarios. Not only do the explanations provide deeper insights into the alerts, but they also reduce false positive alerts by 66%, and by 93% when the fidelity metric is included.

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

Hritik Bansal , Karthik Gopalakrishnan , Saket Dingliwal , Sravan Bodapati , Katrin Kirchhoff , Dan Roth

Category: Natural Language Processing | Artificial Intelligence

2022-12-18

Language models have been shown to perform better with an increase in scale on a wide variety of tasks via the in-context learning paradigm. In this paper, we investigate the hypothesis that the ability of a large language model to perform a task via in-context learning is not uniformly spread across all of its underlying components. Using a 66 billion parameter language model (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: ~70% of attention heads and ~20% of feed-forward networks can be removed with minimal decline in task performance. We find substantial overlap in the set of attention heads (un)important for in-context learning across tasks and across numbers of in-context examples. We also address our hypothesis through a task-agnostic lens, finding that a small set of attention heads in OPT-66B score highly on their ability to perform primitive induction operations associated with in-context learning, namely prefix matching and copying. These induction heads overlap with task-specific important heads, suggesting that induction heads are among the heads capable of more sophisticated behaviors associated with in-context learning. Overall, our study provides several insights indicating that large language models may be under-trained to perform in-context learning, and it opens up questions on how to pre-train language models to perform in-context learning more effectively.

An unobtrusive quality supervision approach for medical image annotation

Sonja Kunzmann , Mathias Öttl , Prathmesh Madhu , Felix Denzinger , Andreas Maier

Category: Computer Vision

2022-11-11

Image annotation is an essential prior step to enable data-driven algorithms. In medical imaging, having large and reliably annotated data sets is crucial to recognize various diseases robustly. However, annotator performance varies immensely and thus impacts model training. Multiple annotators should therefore often be employed, which is expensive and resource-intensive. Hence, it is desirable for users to annotate unseen data while an automated system unobtrusively rates their performance during this process. We examine such a system based on whole slide images (WSIs) showing lung fluid cells. We evaluate two methods for the generation of synthetic individual cell images: conditional Generative Adversarial Networks and Diffusion Models (DM). For qualitative and quantitative evaluation, we conduct a user study to highlight the suitability of the generated cells. Users could not detect 52.12% of the images generated by DM, proving the feasibility of replacing the original cells with synthetic cells without being noticed.

Personalization of CTC Speech Recognition Models using Contextual Adapters and Adaptive Boosting

Saket Dingliwal , Monica Sunkara , Sravan Bodapati , Srikanth Ronanki , Jeff Farris , Katrin Kirchhoff

Category: Natural Language Processing

2022-10-18

End-to-end speech recognition models trained using a joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because of their conditional independence assumption, which prevents output tokens from previous time steps from influencing future predictions. To tackle this, we propose a novel two-way approach that first biases the encoder with attention over a predefined list of rare long-tail and out-of-vocabulary (OOV) words, and then uses dynamic boosting and a phone alignment network during decoding to further bias the subword predictions. We evaluate our approach on the open-source VoxPopuli dataset and on in-house medical datasets, showing a 60% improvement in F1 score on domain-specific rare words over a strong CTC baseline.
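
To make the conditional independence point concrete, here is a minimal sketch of standard greedy CTC decoding, not the decoder or biasing method from the paper. Each frame's token is chosen on its own, with no dependence on earlier outputs; repeats are then merged and blanks dropped. The three-symbol vocabulary and the frame scores are made up for illustration.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    # Each frame is decided independently of every other frame.
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    decoded = []
    prev = None
    for idx in best:
        # Emit a token only when it differs from the previous frame's choice
        # and is not the blank symbol.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


# Hypothetical per-frame scores over a vocabulary {0: blank, 1, 2}.
frames = [
    [0.1, 0.8, 0.1],    # argmax 1
    [0.2, 0.7, 0.1],    # argmax 1 again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank, separates the two occurrences of token 1
    [0.1, 0.7, 0.2],    # argmax 1, a new token because a blank intervened
    [0.1, 0.2, 0.7],    # argmax 2
]
tokens = ctc_greedy_decode(frames)
```

Because no frame sees the tokens already emitted, a rare word cannot be favored through decoding history alone, which is the gap that biasing approaches aim to fill.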

Deformation equivariant cross-modality image synthesis with paired non-aligned training data

Joel Honkamaa , Umair Khan , Sonja Koivukoski , Leena Latonen , Pekka Ruusuvuori , Pekka Marttinen

Category: Computer Vision | Machine Learning

2022-08-26

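Cross-modality image synthesis is an active research topic with multiple clinically relevant medical applications. Recently, methods that allow training with paired but misaligned data have started to emerge. However, no robust and well-performing methods exist that are applicable to a wide range of real-world data sets. In this work, we propose a generic solution to the problem of cross-modality image synthesis with paired but non-aligned data by introducing new loss functions that encourage deformation equivariance. The method consists of joint training of an image synthesis network together with separate registration networks, and it allows adversarial training conditioned on the input even with misaligned data. This work lowers the bar for new clinical applications by allowing effortless training of cross-modality image synthesis networks on more difficult data sets, and it opens up opportunities for developing new generic learning-based cross-modality registration algorithms.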

Simulation-Informed Revenue Extrapolation with Confidence Estimate for Scaleup Companies Using Scarce Time-Series Data

Lele Cao , Sonja Horn , Vilhelm von Ehrenheim , Richard Anselmo Stahl , Henrik Landgren

Category: Machine Learning

2022-08-19

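Investment professionals rely on extrapolating company revenue into the future (i.e., revenue forecasting) to approximate the valuation of scaleups (private companies in a high-growth stage) and to inform their investment decisions. This task is manual and empirical, leaving forecast quality heavily dependent on the experience and insight of the investment professionals. Furthermore, financial data on scaleups is typically proprietary, costly, and scarce, which rules out the wide adoption of data-driven approaches. To this end, we propose a simulation-informed revenue extrapolation (SiRE) algorithm that generates fine-grained long-term revenue predictions from small datasets and short time series. SiRE models the revenue dynamics as a linear dynamical system (LDS), which is solved using the EM algorithm. The main innovation lies in how the noisy revenue measurements are obtained during training and inference. SiRE works for scaleups operating in various sectors and provides confidence estimates. Quantitative experiments on two practical tasks show that SiRE surpasses the baseline methods by a large margin. We also observe high performance when SiRE extrapolates from short time series and predicts over the long term. The balance between performance and efficiency and the explainability of the results are also validated empirically. Evaluated from the perspective of investment professionals, SiRE can precisely locate the scaleups that have great potential returns within 2 to 5 years. Furthermore, our qualitative inspection illustrates some advantageous attributes of the SiRE revenue forecasts.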
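
As a rough illustration of what modeling revenue as a linear dynamical system involves, the sketch below runs a scalar Kalman filter over noisy measurements and then extrapolates with a growing variance as a confidence estimate. It is not SiRE: the model parameters `a`, `q`, and `r` are fixed by hand here, whereas the abstract says they are fitted with the EM algorithm, and the revenue numbers are made up.

```python
def kalman_filter_1d(measurements, a, q, r, x0, p0):
    """Filter noisy scalar data under x_t = a*x_{t-1} + w_t, y_t = x_t + v_t.

    q is the variance of the process noise w_t, r the variance of the
    measurement noise v_t; x0 and p0 are the initial mean and variance.
    """
    x, p = x0, p0
    estimates = []
    for y in measurements:
        # Predict step: propagate the state mean and variance.
        x_pred = a * x
        p_pred = a * a * p + q
        # Update step: blend prediction and measurement via the Kalman gain.
        k = p_pred / (p_pred + r)
        x = x_pred + k * (y - x_pred)
        p = (1.0 - k) * p_pred
        estimates.append((x, p))
    return estimates


def extrapolate(x, p, a, q, steps):
    """Roll the model forward without measurements; variance grows with horizon."""
    forecast = []
    for _ in range(steps):
        x = a * x
        p = a * a * p + q
        forecast.append((x, p))
    return forecast


revenue = [1.0, 1.12, 1.19, 1.35, 1.44]  # made-up noisy revenue series
filtered = kalman_filter_1d(revenue, a=1.1, q=0.01, r=0.05, x0=1.0, p0=1.0)
last_x, last_p = filtered[-1]
forecast = extrapolate(last_x, last_p, a=1.1, q=0.01, steps=3)
```

Each forecast entry is a `(mean, variance)` pair, so the widening variance gives a simple picture of why confidence drops for longer horizons.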

Self-Supervised Speech Representation Learning: A Review

Abdelrahman Mohamed , Hung-yi Lee , Lasse Borgholt , Jakob D. Havtorn , Joakim Edin , Christian Igel , Katrin Kirchhoff , Shang-Wen Li , Karen Livescu , Lars Maaløe

Category: Natural Language Processing

2022-05-21

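Although supervised deep learning has revolutionized speech and audio processing, it has necessitated building specialist models for individual tasks and application scenarios. It is likewise difficult to apply to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision, achieving new levels of performance while reducing the number of labels required in many downstream scenarios. Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods. Other approaches rely on multi-modal data for pre-training, mixing text or visual data streams with speech. Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embeddings and to learning with zero lexical resources, both of which have seen active research for many years. This review presents approaches to self-supervised speech representation learning and their connections to other research areas. Since many current methods focus solely on automatic speech recognition as a downstream task, we also review recent benchmarking efforts that extend the application of learned representations beyond speech recognition.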

Domain Prompts: Towards memory and compute efficient domain adaptation of ASR systems

Saket Dingliwal , Ashish Shenoy , Sravan Bodapati , Ankur Gandhe , Ravi Teja Gadde , Katrin Kirchhoff

Category: Natural Language Processing | Machine Learning

2021-12-16

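Automatic speech recognition (ASR) systems have found use in numerous industrial applications across very diverse domains. Since domain-specific systems perform better than their generic counterparts on in-domain evaluation, the need for memory- and compute-efficient domain adaptation is obvious. In particular, adapting the parameter-heavy transformer-based language models used for rescoring ASR hypotheses is challenging. In this work, we introduce domain-prompts, a methodology that trains a small number of domain token embedding parameters to prime a transformer-based LM to a particular domain. With just a handful of extra parameters per domain, we achieve a 7-14% WER improvement over the baseline of using an unadapted LM. Despite being parameter-efficient, these improvements are comparable to those of fully fine-tuned models with hundreds of millions of parameters. With ablations on prompt sizes, dataset sizes, initializations, and domains, we provide evidence for the benefits of using domain-prompts in ASR systems.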

Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech

Rohit Paturi , Sundararajan Srinivasan , Katrin Kirchhoff

Category: Natural Language Processing | Machine Learning

2021-12-10

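Many recent advances in speech separation are primarily aimed at synthetic mixtures of short audio utterances with high degrees of overlap. These datasets differ significantly from real conversational data, and hence models trained and evaluated on them do not generalize to real conversational scenarios. Another issue with using most of these models for long-form speech is the nondeterministic ordering of the separated speech segments, caused either by unsupervised clustering for time-frequency masks or by the Permutation Invariant Training (PIT) loss. This makes it difficult to accurately stitch homogeneous speaker segments together for downstream tasks such as Automatic Speech Recognition (ASR). In this paper, we propose a speaker-conditioned separator trained on speaker embeddings extracted directly from the mixed signal. We train this model using a directed loss that regulates the order of the separated segments. With this model, we achieve significant improvements in word error rate (WER) on real conversational data without the need for an additional re-stitching step.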
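
The ordering problem described in this abstract can be seen in a few lines. The sketch below implements a permutation invariant loss (minimum over all output-to-speaker assignments) next to a plain fixed-order loss. The fixed-order loss is only a stand-in, assumed here to show what an order-sensitive objective looks like; the abstract does not specify the form of the authors' directed loss. Signals are toy lists of floats.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fixed_order_loss(estimates, references):
    """Order-sensitive loss: output i is always compared with speaker i."""
    return sum(mse(e, r) for e, r in zip(estimates, references)) / len(references)


def pit_loss(estimates, references):
    """Permutation invariant loss: minimum over all output-to-speaker assignments."""
    n = len(references)
    best_loss, best_perm = None, None
    for perm in permutations(range(n)):
        # perm[i] is the index of the output assigned to reference speaker i.
        loss = sum(mse(estimates[p], references[i]) for i, p in enumerate(perm)) / n
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


refs = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
swapped = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]  # right signals, wrong output slots

pit_value, pit_perm = pit_loss(swapped, refs)
fixed_value = fixed_order_loss(swapped, refs)
```

PIT assigns zero loss to the swapped outputs, so a model trained with it is free to emit speakers in any order from one segment to the next, which is why long recordings then need a re-stitching step.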
